Almost all modern imperative programming languages include operations for dynamically manipulating the heap, for example by allocating and deallocating objects, and by updating reference fields. In the presence of recursive procedures and local variables the interactions of a program with the heap can become rather complex, as an unbounded number of objects can be allocated either on the call stack using local variables, or, anonymously, on the heap using reference fields. As such, static analysis of these programs is, in general, undecidable.

In this paper we study the verification of recursive programs with unbounded allocation of objects, in a simple imperative language for heap manipulation. We present an improved semantics for
this language, using an abstraction that is precise. For any program with a bounded visible heap, 
meaning that the number of objects reachable from variables at any point of execution is bounded, 
this abstraction is a finitary representation of its behaviour, even though an unbounded number of 
objects can appear in the state. As a consequence, for such programs model checking is decidable. 
Finally we introduce a specification language for temporal properties of the heap, and discuss model 
checking these properties against heap-manipulating programs. 

1 Introduction 

One of the major problems in model checking recursive programs which manipulate dynamic linked structures is that the state space is infinite, since programs may allocate an unbounded number of objects during execution by updating reference fields (pointers). Indeed, model checking and reachability for such programs are undecidable in general. Consequently, to allow a restricted form of model checking we need to impose either some syntactic restrictions on the program [7] or some suitable bounds on its model. A natural bound for model checking programs, without necessarily restricting their capability of allocating an unbounded number of objects, is to impose constraints on the size of the visible heap. The visible heap consists of those objects which are reachable from the variables in the scope of the currently executed procedure. Such a bound still allows for storage of an unbounded number of objects on the call stack, using local variables.

In this paper we introduce a method for model checking sequential imperative programs with pointers and recursive procedure calls. In order to allow implementation of model checking of unbounded object allocation in the context of a bounded visible heap, we introduce a new mechanism for the generation of fresh object identities which allows for the reuse of object identities and which includes a renaming scheme to resolve possible resulting name clashes. We introduce a formal operational semantics based on this mechanism for an abstract programming language, called Shylock. Subsequently we introduce a logic for reasoning about properties of the heap, where we use atomic propositions defined as regular expressions in what is basically a Kleene algebra with tests [13]. Namely, the global and local variables of a program are used as nominals, whereas the pointers (reference fields) constitute the set of basic actions.

Our renaming mechanism allows a different kind of reuse of object identities than usual garbage collection techniques. A garbage collector typically reuses object identities from the heap and considers objects on the call stack as still in use. In contrast, our technique is tailored towards model checking, and as such we need to reuse as many object identities as possible to guarantee a representation of program behaviour in terms of a finite pushdown system with a finite stack alphabet. In fact, our mechanism allows reusing the identities of objects stored on the call stack, which may become active when procedures return.

Structure of the paper In the next paragraph we briefly discuss related work. We introduce Shylock and its formal semantics in Section 2. In Section 3 the abstraction of this semantics is introduced, together with a proof of its correctness. Then in Section 4 we define a logic for temporal properties of heaps, and finally in the last section we conclude.

Related work We introduce a novel technique for resolving name clashes in the context of reuse of object identities. It is based on the concept of cut points as introduced in [19] to support static analysis via abstract interpretation techniques. Cut points are objects in the heap that are referred to from both local and global variables, and as such are subject to modifications during a procedure call. Recording cut points in extra logical variables allows for a precise abstract execution of the program, which, in the case of a bound on the visible heap, can be represented by a finitary structure, namely that of a finite pushdown system.

In [4] a language is studied with the same features as our Shylock programs, extended with a bounded form of concurrency. Because concurrency is orthogonal to both the vertical growth in the number of objects due to recursion and the horizontal growth due to anonymous field updates, we have decided not to incorporate it in Shylock. In fact, bounded concurrency could easily be handled with a technique similar to the one used in [4]. The novelty of our work is not in the decidability result, which is indeed similar to that obtained in [4], but in the technique we use to obtain it. While [4] uses finite graphs and graph isomorphisms to represent heaps and to avoid name clashes, respectively, our approach is purely symbolic and therefore directly usable for model checking temporal properties of heaps. We discuss the relationship with [4] in more detail in the final paragraph of Section 4.

Currently there are several model checkers for object-oriented languages. Java Path Finder [12] is basically a Java Virtual Machine that executes a Java program not just once but in all possible ways, using backtracking and restoring the state during the state-space exploration. Even though Java Path Finder is capable of checking every Java program, the number of states stored during the exploration limits what can be effectively checked. With JCAT [8], Java source code can be translated into Promela, the input language of SPIN. Since Promela does not support dynamic data structures, fixed-size heaps and stacks have to be allocated.

Bandera [6] is an integrated collection of tools for model checking concurrent Java software using state-of-the-art abstraction, partial order reduction and slicing techniques to reduce the state space. It compiles Java source code into a reduced program model expressed in the input language of other existing verification tools. For example, it can be combined with the SAL (Symbolic Analysis Laboratory) model checker [11], which uses unbounded arrays whose sizes vary dynamically to store objects. In order to explore all reachable states, model checking is restricted to Java programs with a bounded (but not fixed a priori) number of created objects.

TOPICS [14] is a tool which aims to find certain types of bugs in non-recursive C programs which manipulate a restricted type of heaps containing only singly-linked lists. The faults detected by TOPICS are of several types, namely: memory leaks, segmentation faults, array out-of-bounds errors, and usage of undefined objects in tests. The method used for bug detection is based on reachability analysis on counter machines. The transformation of a C program into a counter machine goes through the intermediate representation of pointer machines, which abstract away the contents of the cells of the linked lists.

The problem of reusing object identities has already been faced when defining semantic models for the pi-calculus, most notably in history-dependent automata [16], a model based on the theory of named sets capable of finite-state verification of processes that can allocate fresh resources [5]. Model checking of a possibly unbounded number of objects with pointers, but for a language with a restricted form of recursion (tail recursion) and no block structure, has been studied using high-level allocation Büchi automata [9], which allow for a finite-state symbolic semantics very similar to ours. Full recursion, but with a fixed number of objects, is instead considered in jMoped [10], using a pushdown structure to generate an infinite-state system.

The techniques described in this paper aim at verifying programs by model checking. This is fundamentally different from other tools and techniques for verifying heap-manipulating programs by deductive verification methods, such as separation logic [18]. Automated methods for proving annotated programs are a very active area of research (see e.g. [15, 2, 3]). For a more detailed discussion we refer to [3].

Acknowledgments We would like to thank the anonymous referees for their extensive comments and 
suggestions that greatly improved the presentation of our work. The research of Jurriaan Rot has been 
supported by the Dutch NWO project CoRE. 

2 Shylock: a language to manipulate the heap 

In this section we introduce Shylock, a simple imperative programming language that allows us to focus 
on dynamic pointer structures in the context of recursive procedures with local variables. Programs 
consist of a set of recursive procedures that can create new objects, and store them into their local or 
global variables. Besides being dynamically allocated, objects can be referenced by other objects via 
object fields, and exist as long as they are reachable in the heap from some other object or from a 
variable. To simplify the presentation, objects are the only data structure of Shylock.

We assume an infinite set $V$ of variables ranged over by $x, y$, including a finite collection $G$ of global program variables $\{g_1, \ldots, g_n\}$ and a disjoint finite set $L$ of local program variables $\{l_1, \ldots, l_m\}$. We denote by $C$ the infinite set $V \setminus (G \cup L)$ of cut point variables, ranged over by $c_1, c_2, \ldots$. Further, we assume a distinguished element $\mathit{nil} \in G$, used as a constant to refer to the undefined object. A Shylock program acts only on global and local variables; cut point variables will be used later in the abstract semantics as a kind of "freeze" variables for storing relevant points of the heap during a procedure call. For simplicity we assume that all objects have the same set of fields $F = \{f_1, \ldots, f_k\}$. We denote by $\bar{g}, \bar{l}, \bar{c}, \bar{f}$ the sequences of all global, local, cut point variables and fields, respectively.
For $P$ a finite set of procedure names $\{p_0, \ldots, p_n\}$, a program is a set of procedure declarations of the form $p_i :: B_i$, where $B_i$, denoting the body of the procedure $p_i$, is a statement defined by the following grammar:

$$B ::= x.f := y \mid x := y.f \mid x := \mathbf{new} \mid [x = y]B \mid [x \neq y]B \mid B + B \mid B; B \mid p$$

Here $x$ and $y$ are (local or global) program variables ranging over $G \cup L$, $f$ is a field in $F$ and $p$ is a procedure name in $P$. We assume a distinguished $p_0 \in P$, called the initial procedure of a program.

The assignment statements $x.f := y$ and $x := y.f$ assign the identity of the object referenced at the right-hand side of the assignment to the field or variable, respectively, at the left-hand side. The statement $x := \mathbf{new}$ creates a new object that will be referenced by the program variable $x$. All fields of $x$ will reference the undefined object $\mathit{nil}$. We restrict to programs in which the variable $\mathit{nil}$ does not appear on the left-hand side of an assignment or object creation, i.e., $\mathit{nil}$ is a constant. Conditional statements $[x = y]B$ and $[x \neq y]B$, nondeterministic choice $B_1 + B_2$, and sequential composition $B_1; B_2$ have the standard interpretation. A procedure call $p$ means that the body $B$ associated with $p$ is executed next on the same global state but on a fresh local state. After the procedure body terminates, its local state is destroyed forever and the previous local state (from which the procedure has been called) is restored. Changes to the global state, however, remain.

Notice that variable assignment $x := y.f$ and field update $x.f := y$ suffice, as more general expressions and updates can be encoded. For example, a statement $x := y.f_{i_1} \ldots f_{i_k}$ is encoded as $x := y.f_{i_1}; x := x.f_{i_2}; \ldots; x := x.f_{i_k}$. A basic variable assignment $x := y$ can be encoded as $z.f := y; x := z.f$. More general boolean expressions in conditional statements can be obtained by using sequential composition and nondeterministic choice. In fact $[b_1 \wedge b_2]B$ can be written as $[b_1][b_2]B$, whereas $[b_1 \vee b_2]B$ can be written as $[b_1]B + [b_2]B$. Negation of a boolean expression $b$ can be obtained by transforming $b$ into an equivalent boolean expression in conjunctive or disjunctive normal form, for which negation of the simple expressions $[x = y]$ and $[x \neq y]$ is defined as expected. Ordinary while, skip, and if-then-else statements can be expressed easily in the language, using recursive procedures, conditional statements and nondeterministic choice. For the sake of simplicity, we allow creation and assignment of a single object identity only; generalizations to simultaneous assignments and object creation can be added in a straightforward manner.

The language does not directly support parameter passing. However, it is worthwhile to note that we can model procedures with call-by-value parameters by means of global variables. Let $p(v_1, \ldots, v_n)$ be a procedure with formal parameters $v_1, \ldots, v_n$. We see the formal parameters as local variables and introduce for each parameter $v_i$ a corresponding global variable $g_i$ (which does not appear in the given program). Every procedure call $p(x_1, \ldots, x_n)$ can then be encoded by the statement $g_1 := x_1; \ldots; g_n := x_n; p$, whereas the body $B$ of $p(v_1, \ldots, v_n)$ can be encoded by $v_1 := g_1; \ldots; v_n := g_n; B$. A similar approach can be taken to model procedures with return values. Finally, method calls $x.m(x_1, \ldots, x_n)$ can be modeled by introducing the called object $x$ as an additional parameter of the procedure $m$.

Example 1. We consider a simple example program which opens a file, passes it to some procedure which returns again a file, and finally tries to close this returned file.¹ It consists of the procedures main, q, open and close, and the sets of global and local variables are $G = \{\mathit{nil}\}$ and $L = \{x, y\}$ respectively. The procedures main, open and close are defined as follows:

main :: open(x); y := q(x); close(y)
open :: x := new
close(z) :: [z ≠ nil] z := nil

¹ This idea was suggested to us by Dilian Gurov.
The definition of q is left open. Recall from the above discussion that while parameter passing and return values are not directly in the syntax of the language, they can easily be encoded. The intuition behind this program is as follows. We start by executing main. First the program opens a file, modeled by an object creation, and then it passes the reference x to this file on to a procedure q. This procedure q then performs some calculations and passes back a file reference y. Finally we try to close the file referenced by y. Closing a file is modeled by first checking if the given reference is not nil, and then simply setting the reference to nil. If we pass close a reference to nil, then the program crashes. □
In order to describe the formal semantics of Shylock programs, we first formalize some relevant notions related to the heap. To represent object identities we use the set $\mathbb{N}_\bot = \mathbb{N} \cup \{\bot\}$ of natural numbers extended with an element $\bot$, ranged over by $n, m$. Let $s : V \to \mathbb{N}_\bot$ be a variable assignment and $h : F \to (\mathbb{N}_\bot \to \mathbb{N}_\bot)$ be a field assignment such that for all $f$, $h(f)(\bot) = \bot$ and the set of objects $n$ for which $h(f)(n) \neq \bot$ is finite. A heap $H$ is a pair $(s, h)$ of a variable and a field assignment. We write $H(x)$ for $s(x)$, and $H(f)$ for $h(f)$. For a subset of variables $\mathit{Var} \subseteq V$ we denote with $\mathcal{R}_H(\mathit{Var})$ the set of objects reachable from objects labeled by these variables in $H$ via any of the (functional) transition relations $H(f)$, for $f \in F$. Formally it is defined as the least fixpoint of the equation

$$\mathcal{R}_H(\mathit{Var}) = \{H(x) \mid x \in \mathit{Var}\} \cup \{H(f)(n) \mid f \in F, n \in \mathcal{R}_H(\mathit{Var})\}$$

If $\mathit{Var} = V$ we denote the set of reachable objects of $H$ by $\mathcal{R}_H$. Further we define the "purely local" part of a heap $H$ as $\mathcal{R}^l_H = \mathcal{R}_H(L \cup C) \setminus \mathcal{R}_H(G)$. Intuitively, $\mathcal{R}^l_H$ contains all objects which are reachable from a local (or cut point) variable, but not from a global variable.
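A minimal Python sketch of $\mathcal{R}_H(\mathit{Var})$ and $\mathcal{R}^l_H$ may help fix intuitions. It assumes a hypothetical encoding that is not part of the paper: `None` plays the role of $\bot$, a heap is a pair of dicts `s` (variables) and `h` (fields), and $\bot$ is left out of the computed sets. The worklist loop computes the least fixpoint by adding field successors until nothing new appears.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical encoding (an assumption of this sketch): None stands for ⊥,
# s maps variable names to identities, and h maps each field name to a
# dict from identities to identities, where a missing key means ⊥.
Vars = Dict[str, Optional[int]]
Fields = Dict[str, Dict[int, int]]


def reachable(s: Vars, h: Fields, var: Iterable[str]) -> Set[int]:
    """Least fixpoint R_H(Var), computed with a worklist (⊥ is omitted)."""
    result: Set[int] = set()
    work = [s[x] for x in var if s.get(x) is not None]
    while work:
        n = work.pop()
        if n in result:
            continue
        result.add(n)
        # follow every field transition H(f) out of n
        for succ in h.values():
            m = succ.get(n)
            if m is not None and m not in result:
                work.append(m)
    return result


def purely_local(s: Vars, h: Fields, local_and_cut: Iterable[str],
                 globals_: Iterable[str]) -> Set[int]:
    """R^l_H: reachable from L ∪ C but not from G."""
    return reachable(s, h, local_and_cut) - reachable(s, h, globals_)


s = {"nil": None, "g": 1, "l": 0}
h = {"f": {0: 1}}
print(sorted(purely_local(s, h, ["l"], ["nil", "g"])))  # [0]
```

Here object 1 is reachable from both `l` and `g`, so only object 0 belongs to the purely local part.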

We denote variable update by $H[x := n]$, global field update by $H[f := \varphi]$, where $\varphi : \mathbb{N}_\bot \to \mathbb{N}_\bot$ is a function such that $\varphi(\bot) = \bot$, and local field update by $H[f := \varphi[n := m]]$. We use the standard notation and definition of simultaneous assignments and updates. A renaming $\rho$ of a subset $N \subseteq \mathbb{N}_\bot$ is an injective function $\rho : \mathbb{N}_\bot \to \mathbb{N}_\bot$ such that $\rho(n) = n$ for all $n \notin N$, and otherwise $\rho(n) \in \mathbb{N}$. Clearly it has an inverse, denoted by $\rho^{-1}$. Given a renaming $\rho$ we define its application on a heap $H$ as $\rho(H)(x) = \rho(H(x))$ and $\rho(H)(f)(n) = \rho(H(f)(\rho^{-1}(n)))$.
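Applying a renaming to a heap can be sketched in Python on a hypothetical dict encoding (`None` for $\bot$, a heap as a pair of dicts). The function name `apply_renaming` is illustrative only. Instead of computing $\rho^{-1}$ explicitly, it re-keys every field entry $n \mapsto m$ as $\rho(n) \mapsto \rho(m)$, which satisfies the same equation, and it rejects a renaming that is not injective on the objects occurring in the heap.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: None stands for ⊥; rho is given only on the
# renamed identities and is the identity everywhere else.
Vars = Dict[str, Optional[int]]
Fields = Dict[str, Dict[int, int]]


def apply_renaming(rho: Dict[int, int], s: Vars, h: Fields) -> Tuple[Vars, Fields]:
    """rho(H)(x) = rho(H(x)) and rho(H)(f)(n) = rho(H(f)(rho^-1(n)))."""
    def r(n: Optional[int]) -> Optional[int]:
        # ⊥ is fixed; identities outside rho's domain are unchanged
        return None if n is None else rho.get(n, n)

    # collect the objects of the heap and check injectivity on them
    objs = {n for n in s.values() if n is not None}
    for succ in h.values():
        objs |= set(succ) | set(succ.values())
    images = [r(n) for n in objs]
    if len(set(images)) != len(images):
        raise ValueError("renaming is not injective on this heap")

    new_s = {x: r(n) for x, n in s.items()}
    # re-keying n -> m as rho(n) -> rho(m) gives rho(H)(f)(rho(n)) = rho(H(f)(n))
    new_h = {f: {r(n): r(m) for n, m in succ.items()} for f, succ in h.items()}
    return new_s, new_h


print(apply_renaming({0: 2}, {"g": 0, "l": 1}, {"f": {0: 1}}))
# ({'g': 2, 'l': 1}, {'f': {2: 1}})
```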

A configuration is a tuple $(H, \Gamma)$ where $H$ is the current heap and $\Gamma$ is a stack of statements and heaps. The head of a stack is separated from the tail by means of the right-associative operator $\bullet$, while the empty stack is represented by $\varepsilon$. The current statement to be executed is on the top of the stack. When there are no statements but a heap on the top of the stack, a procedure returns, and the state on the stack has to be restored as the current state. A computation is a (possibly infinite) sequence $C_0 \to C_1 \to \ldots$ of transitions, where $\to$ is a relation between configurations which we now define by cases on the top of the stack. To this end let $\Gamma$ be a stack of statements and heaps. Assignments to a variable or to a field update the current heap structure as expected:

$$(H, x := y.f \bullet \Gamma) \to (H[x := H(f)(H(y))], \Gamma)$$

$$(H, x.f := y \bullet \Gamma) \to (H[f := H(f)[H(x) := H(y)]], \Gamma)$$

To model object creation we assume a distinguished global "system" variable $oc$ which is used as a counter, and does not appear in a program. We implicitly assume that $H(oc) \neq \bot$, for every heap $H$. The semantics of the operator "new" is:

$$(H, x := \mathbf{new} \bullet \Gamma) \to (H[x, oc := H(oc), H(oc) + 1][\bar{f} := \bar{\varphi}], \Gamma)$$

where $\bar{\varphi}$ is the sequence such that $\varphi_i = H(f_i)[H(oc) := \bot]$. Conditional statements are executed depending on the evaluation of the condition:

$$\frac{H(x) = H(y)}{(H, [x = y]B \bullet \Gamma) \to (H, B \bullet \Gamma)} \qquad \frac{H(x) \neq H(y)}{(H, [x \neq y]B \bullet \Gamma) \to (H, B \bullet \Gamma)}$$
Sequential composition adds on top of the stack the next statements to be executed, while nondeterministic choice selects just one of the two statements:

$$(H, B_1; B_2 \bullet \Gamma) \to (H, B_1 \bullet B_2 \bullet \Gamma) \qquad (H, B_1 + B_2 \bullet \Gamma) \to (H, B_i \bullet \Gamma) \quad i \in \{1, 2\}$$

Finally, procedure call and return are modeled as follows:

$$(H, p \bullet \Gamma) \to (H[\bar{l} := \bot], B \bullet H \bullet \Gamma) \qquad (H, H' \bullet \Gamma) \to (H[\bar{l} := H'(\bar{l})], \Gamma)$$

where $H'(\bar{l})$ denotes the pointwise application of $H'$ to the local variables $\bar{l}$, and $B$ is the body of the procedure $p$. Recall that for technical convenience there is a single sequence of local variables $\bar{l}$ shared by the procedures.
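The concrete rules above can be prototyped as a successor function on configurations. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: statements are hypothetical tuples, the top of the stack is the last list element, the counter $oc$ is stored among the variables, and a field update through $\mathit{nil}$ simply blocks, a case the rules above leave implicit.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding, not from the paper: statements are tuples such as
# ("load", x, y, f) for x := y.f, ("store", x, f, y) for x.f := y,
# ("new", x), ("eq", x, y, B), ("neq", x, y, B), ("seq", B1, B2),
# ("choice", B1, B2) and ("call", p); a pending heap is ("heap", H).
# The top of the stack is the LAST list element; None stands for ⊥.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def copy_heap(H: Heap) -> Heap:
    s, h = H
    return dict(s), {f: dict(succ) for f, succ in h.items()}


def step(H: Heap, stack: list, procs: dict, locals_: List[str]) -> list:
    """All concrete successors of the configuration (H, stack)."""
    if not stack:
        return []
    top, rest = stack[-1], stack[:-1]
    s, h = H
    kind = top[0]
    if kind == "heap":                        # return: H[l := H'(l)]
        H2 = copy_heap(H)
        for l in locals_:
            H2[0][l] = top[1][0].get(l)
        return [(H2, rest)]
    if kind == "load":                        # x := y.f, with H(f)(⊥) = ⊥
        _, x, y, f = top
        H2 = copy_heap(H)
        H2[0][x] = None if s.get(y) is None else h[f].get(s[y])
        return [(H2, rest)]
    if kind == "store":                       # x.f := y
        _, x, f, y = top
        if s.get(x) is None:
            return []                         # assumption: nil.f := y blocks
        H2 = copy_heap(H)
        if s.get(y) is None:
            H2[1][f].pop(s[x], None)          # the field becomes ⊥
        else:
            H2[1][f][s[x]] = s[y]
        return [(H2, rest)]
    if kind == "new":                         # x, oc := H(oc), H(oc) + 1
        _, x = top
        H2 = copy_heap(H)
        n = s["oc"]
        H2[0][x], H2[0]["oc"] = n, n + 1
        for succ in H2[1].values():           # all fields of n are ⊥
            succ.pop(n, None)
        return [(H2, rest)]
    if kind in ("eq", "neq"):                 # [x = y]B and [x ≠ y]B
        _, x, y, body = top
        holds = s.get(x) == s.get(y)
        return [(H, rest + [body])] if holds == (kind == "eq") else []
    if kind == "seq":                         # B1; B2 • Γ  ->  B1 • B2 • Γ
        return [(H, rest + [top[2], top[1]])]
    if kind == "choice":                      # one successor per branch
        return [(H, rest + [top[1]]), (H, rest + [top[2]])]
    if kind == "call":                        # H[l := ⊥], B • H • Γ
        H2 = copy_heap(H)
        for l in locals_:
            H2[0][l] = None
        return [(H2, rest + [("heap", copy_heap(H)), procs[top[1]]])]
    raise ValueError(f"unknown statement {top!r}")
```

Nondeterministic choice is the only rule that yields two successors, and a failed guard yields none; every other rule is deterministic.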

This section is concluded with the notion of properness, which is a formalization of some operational properties of configurations appearing during the execution of programs. Suppose $H$ and $H'$ are heaps which appear in the stack, meaning they are pending heaps from a procedure call, and $H$ appears higher up in the stack than $H'$ (so $H'$ was put on the stack before $H$). Then (1) the structure of the purely local part of $H'$ is preserved in $H$, since it could not have been accessed in between. Moreover, (2) if some object is reachable in both $H$ and $H'$, then it must be reachable from a global variable in $H'$. Both conditions intuitively capture the property that reachable objects remain reachable in a recursive call only if they were already reachable from global variables.

Definition 1. The set of proper stacks is defined inductively as follows:

• the empty stack is proper

• if $\Gamma$ is proper then $B \bullet \Gamma$ is proper for statements $B$

• if $\Gamma$ is proper and $H$ is a heap such that for every $H'$ occurring in $\Gamma$ the following holds:

  - for all $f \in F$ and $n \in \mathcal{R}^l_{H'}$: $H(f)(n) = H'(f)(n)$
  - $\mathcal{R}_H \cap \mathcal{R}_{H'} \subseteq \mathcal{R}_{H'}(G)$

  then $H \bullet \Gamma$ is proper.

A configuration $(H, \Gamma)$ is proper if $H \bullet \Gamma$ is a proper stack.

For example every configuration $(H, p_0)$ is proper. Further, all transition steps preserve proper configurations. Thus every configuration in a computation starting from a proper one is proper.
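The two properness conditions (1) and (2) described above can be checked directly on the heaps of a stack. This Python sketch assumes a hypothetical dict encoding (`None` for $\bot$), lists the heaps of a stack from top to bottom, ignores statements (they never affect properness) and leaves out the counter $oc$, since it is not an object reference.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: a heap is (s, h) with None for ⊥.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def reachable(H: Heap, var: Iterable[str]) -> Set[int]:
    """R_H(Var) by a worklist search (⊥ omitted)."""
    s, h = H
    seen: Set[int] = set()
    work = [s[x] for x in var if s.get(x) is not None]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(succ[n] for succ in h.values() if succ.get(n) is not None)
    return seen


def proper(heaps: List[Heap], local_and_cut: List[str], globals_: List[str]) -> bool:
    """Check conditions (1) and (2) for every pair H above H' (top first)."""
    for i, H in enumerate(heaps):
        reach_H = reachable(H, H[0].keys())
        for Hp in heaps[i + 1:]:                 # every H' below H
            glob_p = reachable(Hp, globals_)
            local_p = reachable(Hp, local_and_cut) - glob_p
            # (1) the purely local part of H' is unchanged in H
            for f, succ in Hp[1].items():
                if any(H[1].get(f, {}).get(n) != succ.get(n) for n in local_p):
                    return False
            # (2) objects reachable in both heaps are global in H'
            if not (reach_H & reachable(Hp, Hp[0].keys())) <= glob_p:
                return False
    return True
```

For instance, a callee heap that still refers, through a local variable, to an object that is purely local in the caller violates condition (2).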

3 Improving the semantics 

The semantics introduced above may generate, for a given program, a transition system with infinitely many configurations. This is not only because of the unbounded stack size but, more problematically, also because each time a new object is allocated a new natural number is used. Thus, the number of heaps needed is also unbounded. Consider for example the Shylock program consisting of a single procedure $p$ with as body the statement

$$x := \mathbf{new}; p$$

where $x$ is a local variable. Each time the statement $x := \mathbf{new}$ is executed, a new natural number is assigned to $x$. This has the unfortunate consequence that infinitely many heaps are needed, and thus the usual model checking techniques for recursive systems [10, 21] cannot be guaranteed to terminate.
In this section we introduce an abstract semantics for Shylock, based on reuse of natural numbers for objects. Identities of objects that are not in use in the current heap will be reused instead of using a new identity each time a new object is allocated. More concretely, when creating a new object we choose the minimal unused identity available, as formally expressed by the following rule:

$$(H, x := \mathbf{new} \bullet \Gamma) \to (H[x := n][\bar{f} := \bar{\varphi}], \Gamma)$$

where $n = \min(\mathbb{N} \setminus \mathcal{R}_H)$ and, for each $i$, $\varphi_i = H(f_i)[n := \bot]$. The intuition here is that numbers are no longer concrete object identities, but instead they represent equivalence classes. However, this adapted rule may introduce name clashes with objects in the local state pending on the stack. We illustrate this problem and our solution with an example. Consider the following heap:

[Figure: example heap over the local variable l, the global variable g and the field f.]

Here $x : n$ represents the identity $n$ to which the variable $x$ refers (so technically the figure represents a heap $H$ for which $H(l) = 0$, $H(f)(0) = 1$, etc.). Further, $l$ is a local variable, $g$ is a global variable, and $f$ is the single field. Let us first consider the execution of a call to a procedure $p :: g := \mathbf{new}$. Starting from the above heap, on the call a copy is placed onto the stack and the local variable $l$ is initialized to $\bot$, so the procedure $p$ is executed on the following heap:



[Figure: the heap of the callee, in which the local variable l is ⊥.]
When executing $g := \mathbf{new}$ we take for $g$ the minimal object identity unreachable from the current variables, which is 0. Then, on procedure return, we see that there is a name clash: both $g$ and $l$ point to the object with identity 0, while they should obviously not be identified. A solution is to rename the object $n$ to which $g$ points, i.e., to make $g$ point to an identity $m$ which is used neither by the current nor by the caller's stored heap, and to update the fields of this new object according to the fields of $n$. Then we can just take the union of the global part of the new heap and the local part of the stored heap. For example, we could rename, in the current heap, the object 0 to 2, which is free, take the union with the (local part of the) stored heap of the caller, and combine the two heaps as follows:



[Figure: the combined heap after the procedure returns.]
However, consider now the execution of a procedure p' :: g := new; g := new, starting from the same 
heap as before (the first figure above). After executing the first object creation statement in the procedure, 
g again points to 0. But then after the second time we execute g := new, g is assigned the minimal index 
available, which is 1 at that point. Thus on procedure return, the heap is exactly the same as in the 
beginning of the procedure execution. So two object creations are in this case indistinguishable from no 
creation at all. On the procedure return, when combining the current heap with the stored heap it is thus 
not clear whether g should keep pointing to 1 (when no object creation statements were executed), or if it 
should be renamed to a new identity separate from the others (when two object creation statements were 
executed). 
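The identity-reuse problem can be reproduced in a few lines of Python. This sketch implements the abstract `new` rule (minimal unreachable identity, all fields of the new object reset to $\bot$) on a hypothetical dict encoding with `None` for $\bot$, and runs $g := \mathbf{new}; g := \mathbf{new}$ on an assumed callee heap in which only $g$ refers to an object. The final heap equals the initial one, which is exactly the ambiguity discussed above.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding: a heap is (s, h) with None for ⊥.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def reachable(H: Heap) -> Set[int]:
    """R_H: every object reachable from some variable (⊥ omitted)."""
    s, h = H
    seen: Set[int] = set()
    work = [n for n in s.values() if n is not None]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(succ[n] for succ in h.values() if succ.get(n) is not None)
    return seen


def abstract_new(H: Heap, x: str) -> Heap:
    """x := new with n = min(ℕ minus R_H) and all fields of n set to ⊥."""
    used = reachable(H)
    n = 0
    while n in used:                         # smallest identity not in use
        n += 1
    s, h = H
    new_h = {f: {k: v for k, v in succ.items() if k != n} for f, succ in h.items()}
    return {**s, x: n}, new_h


# Callee heap: l is ⊥ and g refers to object 1 (assumed example values).
H0 = ({"g": 1, "l": None}, {"f": {}})
H2 = abstract_new(abstract_new(H0, "g"), "g")
print(H2 == H0)  # True: two allocations are indistinguishable from none
```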
Our solution to this problem is a non-trivial extension of the semantics of procedure call and return based on the identification of so-called cut points, which are the object identities at the "edge" of the global and the local part of the heap, representing exactly the point where the local part "enters" the global part. On a procedure call, these cut points are identified, and we assign their values to a set of distinguished cut point variables in the heap of the callee. Then, on procedure return, the cut point variables "connect" the current heap with the stored one, giving us precisely the information about how to combine the two. Returning to our example, consider the following heap:

[Figure: the initial heap of the callee, extended with the cut point variable c.]

This heap represents the initial heap of the callee, extended with the only cut point of the caller heap, in 
the form of the cut point variable c. Now if we execute g := new;g := new, the global variable g is not 
assigned the identity 1 since it is already in use. So now on procedure return we can distinguish between 
the case that g was newly created (in which case it will have a new identity), and the case that it was not 
(in which case it will have the same identity as before). 

Formally, for a given heap $H$, the set $CP_H$ of cut points is defined as follows:

$$CP_H = \mathcal{R}_H(G) \cap \big(H(L \cup C) \cup F(\mathcal{R}^l_H)\big)$$

where $F(N) = \{H(f)(n) \mid n \in N, f \in F\}$. Further, applying $H$ to a set of variables means applying it pointwise, hence $H(L \cup C) = \{H(v) \mid v \in L \cup C\}$. Note that the definition involves the cut point variables; recall from the above discussion that these variables represent the cut points of the previous heap. Further recall that $\mathcal{R}_H(G)$ is the global part of the heap, while $\mathcal{R}^l_H$ is the "purely local" part of the heap. Intuitively, $F(\mathcal{R}^l_H)$ represents the objects which are pointed to by a field from an object which is purely local. Further, $H(L \cup C) \cap \mathcal{R}_H(G)$ is the set containing objects pointed to directly by the local (or cut point) variables which are also reachable from global variables. On the other hand, $F(\mathcal{R}^l_H) \cap \mathcal{R}_H(G)$ contains the globally reachable objects adjacent to the purely local nodes (where node adjacency is provided by the field transitions).

Now the procedure call of the improved semantics is modeled by the following rule:

$$(H, p_i \bullet \Gamma) \to (H[\bar{v} := \bot][\bar{c} := \bar{n}], B_i \bullet H \bullet \Gamma)$$

where $\bar{v}$ is the sequence of local variables and of the cut point variables $c$ for which $H(c) \neq \bot$. Further, $\bar{c}$ is a sequence of cut point variables of the same length as the sequence $\bar{n}$ of cut points $CP_H$. Note that, given a current heap $H_c$, if on top of the stack we have a heap $H_l$, then by the way we modeled the procedure call, the cut points in the stacked heap $H_l$ correspond exactly to the cut point variables in the current heap $H_c$.
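The cut point set $CP_H$ and the improved call rule can be sketched as below. Assumptions of this illustration: a hypothetical dict encoding with `None` for $\bot$, a finite list `cuts` standing in for the infinite set $C$, and a canonical choice in which the cut points, sorted increasingly, are stored in the first variables of `cuts`; the rule itself only requires some sequence $\bar{c}$ of the right length.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: a heap is (s, h) with None for ⊥; stack top is last.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def reachable(H: Heap, var: Iterable[str]) -> Set[int]:
    s, h = H
    seen: Set[int] = set()
    work = [s[x] for x in var if s.get(x) is not None]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(succ[n] for succ in h.values() if succ.get(n) is not None)
    return seen


def cut_points(H: Heap, globals_: List[str], locals_: List[str],
               cuts: List[str]) -> Set[int]:
    """CP_H = R_H(G) ∩ (H(L ∪ C) ∪ F(R^l_H))."""
    s, h = H
    lc = locals_ + cuts
    glob = reachable(H, globals_)
    purely_local = reachable(H, lc) - glob
    direct = {s[v] for v in lc if s.get(v) is not None}          # H(L ∪ C)
    adjacent = {m for succ in h.values()                          # F(R^l_H)
                for n, m in succ.items() if n in purely_local}
    return glob & (direct | adjacent)


def abstract_call(H: Heap, body, stack: list, globals_: List[str],
                  locals_: List[str], cuts: List[str]):
    """(H, p • Γ) -> (H[v := ⊥][c := n], B • H • Γ)."""
    cps = sorted(cut_points(H, globals_, locals_, cuts))
    if len(cps) > len(cuts):
        raise ValueError("not enough cut point variables")
    s, h = H
    new_s = dict(s)
    for v in locals_ + cuts:           # reset locals and old cut point variables
        new_s[v] = None
    for c, n in zip(cuts, cps):        # freeze each cut point in a variable
        new_s[c] = n
    new_h = {f: dict(succ) for f, succ in h.items()}
    return (new_s, new_h), stack + [("heap", H), body]
```

On a heap where $l$ refers to 0, $f(0) = 1$ and $g$ refers to 1, the only cut point is 1, so the callee starts with $l = \bot$ and a single cut point variable holding 1.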

We proceed to discuss the construction of the return heap, say $H_r$. We first rename all objects of the purely local part of $H_l$ which conflict with $H_c$, meaning that they are also reachable from global variables in $H_c$. When this is done, we can copy all the local variables directly to $H_c$ and update the fields of the purely local part of $H_l$ in $H_c$. In order to formalize this process we define the set of name clashes $N$ of $H_c$ and $H_l$:

$$N = \mathcal{R}^l_{H_l} \cap \mathcal{R}_{H_c}(G)$$

Remember that $\mathcal{R}^l_{H_l}$ only contains the objects which are reachable from a local variable (or from a previous cut point variable) and are not reachable from any global variable. Now the return rule is formalized as follows:

$$(H_c, H_l \bullet \Gamma) \to (\theta \circ \rho(H_c), \Gamma)$$

where

• $\rho$ is a renaming, monotonic on $N$, such that $\rho(n) \in \mathbb{N} \setminus (\mathcal{R}_{H_c} \cup \mathcal{R}_{H_l})$ if $n \in N$, and $\rho(n) = n$ otherwise, and $\rho$ is minimal w.r.t. pointwise comparison between renaming functions;

• $\theta$ resets the purely local part:

$$\theta(H) = H[\bar{l}, \bar{c} := H_l(\bar{l}), H_l(\bar{c})][\bar{f} := \bar{\varphi}]$$

where $\bar{c}$ is the sequence of cut point variables $c$ for which $H_l(c) \neq \bot$, and $\bar{\varphi}$ is the sequence defined, for all $n \in \mathbb{N}_\bot$, as follows:

$$\varphi_i(n) = \begin{cases} H_l(f_i)(n) & \text{if } n \in \mathcal{R}^l_{H_l} \\ H(f_i)(n) & \text{otherwise} \end{cases}$$
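A sketch of the return rule follows, on a hypothetical dict encoding with `None` for $\bot$. It computes the name clashes $N$, builds $\rho$ by mapping the clashes in increasing order to the smallest identities used by neither heap (which makes $\rho$ monotonic on $N$ and pointwise minimal), applies $\rho$ to $H_c$, and then performs the reset $\theta$. As a simplification of this sketch, field entries of objects unreachable in $H_c$ are discarded before renaming.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical encoding: a heap is (s, h) with None for ⊥.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def reachable(H: Heap, var: Iterable[str]) -> Set[int]:
    s, h = H
    seen: Set[int] = set()
    work = [s[x] for x in var if s.get(x) is not None]
    while work:
        n = work.pop()
        if n not in seen:
            seen.add(n)
            work.extend(succ[n] for succ in h.values() if succ.get(n) is not None)
    return seen


def abstract_return(Hc: Heap, Hl: Heap, globals_: List[str],
                    locals_: List[str], cuts: List[str]) -> Heap:
    """(H_c, H_l • Γ) -> (θ(ρ(H_c)), Γ)."""
    loc_l = reachable(Hl, locals_ + cuts) - reachable(Hl, globals_)  # R^l_{H_l}
    clashes = sorted(loc_l & reachable(Hc, globals_))                # N
    reach_c = reachable(Hc, Hc[0].keys())
    used = reach_c | reachable(Hl, Hl[0].keys())
    # ρ: clashes, in increasing order, go to the smallest unused identities
    rho: Dict[int, int] = {}
    fresh = 0
    for n in clashes:
        while fresh in used:
            fresh += 1
        rho[n] = fresh
        fresh += 1

    def r(n: Optional[int]) -> Optional[int]:
        return None if n is None else rho.get(n, n)

    s, h = Hc
    # ρ(H_c), keeping only field entries of objects reachable in H_c
    s1 = {x: r(n) for x, n in s.items()}
    h1 = {f: {r(n): r(m) for n, m in succ.items() if n in reach_c}
          for f, succ in h.items()}
    # θ: restore locals, the caller's non-⊥ cut point variables, and the
    # fields of the purely local objects of H_l
    for v in locals_:
        s1[v] = Hl[0].get(v)
    for c in cuts:
        if Hl[0].get(c) is not None:
            s1[c] = Hl[0][c]
    for f, succ in h1.items():
        for n in loc_l:
            m = Hl[1].get(f, {}).get(n)
            if m is None:
                succ.pop(n, None)
            else:
                succ[n] = m
    return s1, h1
```

With a caller heap $l : 0$, $f(0) = 1$, $g : 1$ and a callee that has executed $g := \mathbf{new}$ (so $g : 0$ and $c_1 : 1$), the clash on 0 is resolved by moving $g$ to the fresh identity 2, while $l$ and $f(0) = 1$ are restored from the caller.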

Correctness We provide a proof of the equivalence between the concrete semantics of Shylock and the abstract semantics defined above. To this end we adapt the concrete semantics to take into account the initialization and restoration of the cut point variables, similar to the abstract semantics. Note that this does not affect the behaviour of programs, as they are assumed not to contain cut point variables. First we need the basic notion of heap isomorphism:

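Definition 2. Two heaps $H$ and $H'$ are isomorphic, denoted $H \cong H'$, if there exists a function $\alpha : \mathcal{R}_H \to \mathcal{R}_{H'}$ such that

• $\alpha$ is a bijection;

• for each $x \in V$: $\alpha(H(x)) = H'(x)$;

• for each $f \in F$ and $n \in \mathcal{R}_H$: $\alpha(H(f)(n)) = H'(f)(\alpha(n))$.

We write $H \cong_\alpha H'$ when we want to make $\alpha$ explicit.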
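Because fields are functions, an isomorphism, if it exists, can be found by walking both heaps in lockstep from the variables. The sketch below, a hypothetical helper `find_isomorphism` on a dict encoding with `None` for $\bot$, builds $\alpha$ this way and checks the three conditions of Definition 2; a variable missing from one heap is treated as $\bot$.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: a heap is (s, h) with None for ⊥.
Heap = Tuple[Dict[str, Optional[int]], Dict[str, Dict[int, int]]]


def find_isomorphism(H1: Heap, H2: Heap, fields: List[str]) -> Optional[Dict[int, int]]:
    """Return the unique α with H1 ≅_α H2, or None if there is none."""
    (s1, h1), (s2, h2) = H1, H2
    alpha: Dict[int, int] = {}
    work: List[int] = []

    def bind(a: Optional[int], b: Optional[int]) -> bool:
        # ⊥ may only correspond to ⊥; otherwise extend or check α
        if a is None or b is None:
            return a is None and b is None
        if a in alpha:
            return alpha[a] == b
        alpha[a] = b
        work.append(a)
        return True

    # α(H(x)) = H'(x) for every variable
    for x in s1.keys() | s2.keys():
        if not bind(s1.get(x), s2.get(x)):
            return None
    # α(H(f)(n)) = H'(f)(α(n)) for every reachable n; the lockstep walk
    # visits all of R_H1 and R_H2, so α is total and onto
    while work:
        n = work.pop()
        for f in fields:
            if not bind(h1.get(f, {}).get(n), h2.get(f, {}).get(alpha[n])):
                return None
    # α must also be injective to be a bijection
    if len(set(alpha.values())) != len(alpha):
        return None
    return alpha
```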

Note that since fields are deterministic and every object in $\mathcal{R}_H$ is reachable from some variable, such a function $\alpha$, if it exists, is unique. In order to proceed, we introduce the important notion of cut point identification. Recall that on a procedure call, the new heap $H_c$ represents in its cut point variables the cut points of the caller heap $H_l$. Suppose now we have other heaps $H'_c$ and $H'_l$ such that $H_c \cong H'_c$ and $H_l \cong H'_l$. Note that cut points are preserved by isomorphisms. Cut point identification now formalizes the representation of cut points of $H_l$ and $H'_l$ in $H_c$ and $H'_c$ respectively, using for each cut point in $H_l$ and $H'_l$ the same variable to represent it in $H_c$ and $H'_c$.

Definition 3 (Cut point identification). Let $H_c, H_l, H'_c, H'_l$ be heaps such that $H_c \cong_{\alpha_c} H'_c$ and $H_l \cong_{\alpha_l} H'_l$. Let $\{n_1, \ldots, n_k\} = CP_{H_l}$ be the cut points of $H_l$. We define

$$(H_c, H_l) \bowtie (H'_c, H'_l)$$

iff there exists a sequence of cut point variables $c_1, \ldots, c_k$ such that for all $i \leq k$:

$$H_c(c_i) = n_i \quad \text{and} \quad H'_c(c_i) = \alpha_l(n_i)$$

From this definition we immediately deduce that the two isomorphisms agree on cut points:

Corollary 1. If $(H_c, H_l) \bowtie (H'_c, H'_l)$ then for all $n \in CP_{H_l}$: $\alpha_l(n) = \alpha_c(n)$.

Proof. For all $n_i \in CP_{H_l}$ we have $\alpha_l(n_i) = H'_c(c_i) = \alpha_c(H_c(c_i)) = \alpha_c(n_i)$. □

Now we are ready to introduce a strong notion of equivalence, based on heap isomorphism, which 
also takes along the main operational properties characterizing configurations appearing in computations. 

Definition 4. Given stacks $\Gamma, \Gamma'$ we define $\Gamma \sim \Gamma'$ inductively as follows:






Interacting via the Heap in the Presence of Recursion 



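• if $\Gamma$ and $\Gamma'$ are both empty then $\Gamma \sim \Gamma'$

• if $\Gamma \sim \Gamma'$ then $B \bullet \Gamma \sim B \bullet \Gamma'$

• if

  - $\Gamma \sim \Gamma'$ and $H \cong H'$,
  - $(H, \nu(\Gamma)) \bowtie (H', \nu(\Gamma'))$,
  - $H \bullet \Gamma$ is proper,

  then $H \bullet \Gamma \sim H' \bullet \Gamma'$

where $\nu(\Gamma)$ extracts from the stack the top heap. Now for configurations we define $(H, \Gamma) \sim (H', \Gamma')$ iff $H \bullet \Gamma \sim H' \bullet \Gamma'$.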

The following lemma states how the current heap and the stacked heap are combined on a procedure 
return in the concrete semantics. More precisely it expresses that any identity which becomes reachable 
right after a procedure returns, is in the purely local part of the heap of the caller procedure. 

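Lemma 1. Suppose $(H_c, H_l \bullet \Gamma) \sim (H'_c, H'_l \bullet \Gamma')$. Let $H_r = H_c[\bar{l} := H_l(\bar{l})]$. Then for all $n \in \mathcal{R}_{H_r}$: if $n \notin \mathcal{R}_{H_r}(G \cup C)$ then $n \in \mathcal{R}^l_{H_l}$.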

Proof. Let Hc,Hi,Hr be as above. Let n G and assume n (GU C), so n is reachable in Hr from 
a local variable. We prove by induction that any path in 77^ reaching such an n is reflected in the same 
path in 77/ which lies entirely in its purely local part . 

Suppose first that $n = H_r(l)$ for some local variable $l$. Then $n = H_r(l) = H_c[l := H_l(l)](l) = H_l(l)$. Now, by assumption (that is, by the relation $\approx$), $H_c$ has cut point variables precisely on the cut points of $H_l$. We may then conclude that $n$ is not reachable from a global variable in $H_l$: otherwise it would, by definition of cut points, be a cut point of $H_l$, and consequently be referred to by a cut point variable in $H_c$, which contradicts our assumption on the reachability of $n$. Thus $n$ lies in the purely local part of $H_l$.

Now let $n = H_r(f_1) \circ \cdots \circ H_r(f_k)(H_r(l))$ such that $n$ is not reachable from $G \cup C$, $n = H_l(f_1) \circ \cdots \circ H_l(f_k)(H_l(l))$, and $n$ lies in the purely local part of $H_l$. Suppose $H_r(f)(n)$ is not reachable from $G \cup C$ for some field $f$. Since $n$ lies in the purely local part of $H_l$ and $H_c \bullet H_l$ is proper, we have $H_l(f)(n) = H_r(f)(n)$. Now $H_r(f)(n) \neq H_c(c)$ for all cut point variables $c$; otherwise $H_r(f)(n)$ would be reachable from the cut point variable $c$, contradicting our assumption. But then, by the cut point identification of $H_c$ and $H_l$, $H_l(f)(n)$ is not a cut point of $H_l$, which implies that the path has not yet entered the global part of $H_l$, i.e., $H_l(f)(n)$ lies in the purely local part of $H_l$, as desired. □

We are now ready for the main theorem of this section, stating that the concrete and the abstract semantics are equivalent.

Theorem 1 (Bisimulation). Let $C_1$ and $C_2$ be configurations such that $C_1 \approx C_2$. Denote by $\to_c$ and $\to_a$ the transition relations corresponding to the concrete and the abstract semantics, respectively. If $C_1 \to_c C'_1$ then there exists a configuration $C'_2$ such that $C_2 \to_a C'_2$ and $C'_1 \approx C'_2$, and vice versa.

Proof. We only discuss the isomorphism of the resulting heaps on procedure return. Suppose

$$(H_c, H_l \bullet \Gamma) \approx (H'_c, H'_l \bullet \Gamma')$$

By definition of the concrete and the abstract semantics, the transitions enabled from these respective configurations are

$$(H_c, H_l \bullet \Gamma) \to_c (H_r, \Gamma) \quad\text{and}\quad (H'_c, H'_l \bullet \Gamma') \to_a (H'_r, \Gamma')$$

where $H'_r$ is the heap produced by the abstract procedure return. By definition of $\approx$ there are isomorphisms $H_c \cong_{\alpha_c} H'_c$ and $H_l \cong_{\alpha_l} H'_l$. We explicitly define an isomorphism $\alpha$ from the objects of $H_r$ to the objects of $H'_r$ as follows:



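$$\alpha(n) = \begin{cases} \alpha_c(n) & \text{if } n \text{ is reachable in } H_r \text{ from } G \cup C\\ \alpha_l(n) & \text{otherwise} \end{cases}$$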



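Note that by Lemma 1, if $n$ is not reachable in $H_r$ from $G \cup C$, then $n$ lies in the purely local part of $H_l$, so $\alpha$ is well-defined. To see that $\alpha$ is an isomorphism, intuitively, note that $\alpha_c$ is an isomorphism on the part reachable from $G \cup C$ and $\alpha_l$ is an isomorphism on the purely local part of $H_l$, and by cut point identification we know from Corollary 1 that $\alpha_c(n) = \alpha_l(n)$ for all $n \in CP_{H_l}$. □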
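The case split defining $\alpha$ can be prototyped directly. The sketch below is only an illustration under stated assumptions: object identities are integers, the part of $H_r$ reachable from $G \cup C$ has already been computed, and the code only checks that the glued mapping is a well-defined injective function; it does not check that fields are preserved. The name `glue_isomorphisms` is hypothetical.

```python
from typing import Dict, Set


def glue_isomorphisms(
    objects: Set[int],        # all objects of H_r
    global_part: Set[int],    # objects of H_r reachable from G ∪ C
    alpha_c: Dict[int, int],  # isomorphism H_c -> H'_c
    alpha_l: Dict[int, int],  # isomorphism H_l -> H'_l
    cut_points: Set[int],     # CP_{H_l}
) -> Dict[int, int]:
    # Corollary 1: both isomorphisms must agree on every cut point.
    for n in cut_points:
        if alpha_c.get(n) != alpha_l.get(n):
            raise ValueError(f"isomorphisms disagree on cut point {n}")
    alpha: Dict[int, int] = {}
    for n in objects:
        # alpha_c on the part reachable from G ∪ C, alpha_l on the rest
        # (by Lemma 1 the rest lies in the purely local part of H_l).
        source = alpha_c if n in global_part else alpha_l
        if n not in source:
            raise ValueError(f"object {n} is outside the chosen isomorphism")
        alpha[n] = source[n]
    if len(set(alpha.values())) != len(alpha):
        raise ValueError("glued mapping is not injective")
    return alpha


alpha = glue_isomorphisms(
    objects={1, 2, 3},
    global_part={1, 2},
    alpha_c={1: 10, 2: 20},
    alpha_l={2: 20, 3: 30},
    cut_points={2},
)
print(alpha)
```

Object 2 plays the role of a cut point: it is mapped consistently by both isomorphisms, which is exactly what makes the case split coherent.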



4 Model checking Shylock programs 

In this section we present a framework for model checking Shylock programs. We first turn our abstract semantics into a pushdown system, then we introduce a linear time temporal logic for heap structures, and finally we briefly recall the actual model checking procedure.



Programs as pushdown systems A pushdown system can be considered as a pushdown automaton without an input alphabet. Formally, a pushdown system $\mathcal{P}$ is a triple $(\Lambda, \Sigma, \hookrightarrow)$ where $\Lambda$ is a set of control locations, $\Sigma$ is a stack alphabet, and $\hookrightarrow \subseteq (\Lambda \times \Sigma) \times (\Lambda \times \Sigma^*)$ is the set of rules. A pushdown system is said to be finite when these three sets are all finite.
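A finite pushdown system can be encoded directly as data. The sketch below assumes that control locations and stack symbols are strings, writes a rule $(p, \gamma) \hookrightarrow (p', w)$ as a pair of tuples, and represents a configuration as a control location with a stack word whose first element is the top; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Head = Tuple[str, str]                           # (control location, top symbol)
Rule = Tuple[Head, Tuple[str, Tuple[str, ...]]]  # (p, gamma) -> (p', w)
Config = Tuple[str, Tuple[str, ...]]             # (control location, stack, top first)


@dataclass(frozen=True)
class PushdownSystem:
    locations: FrozenSet[str]
    stack_alphabet: FrozenSet[str]
    rules: FrozenSet[Rule]

    def successors(self, config: Config) -> List[Config]:
        """One-step successors: replace the top symbol gamma by the word w."""
        p, stack = config
        if not stack:
            return []  # no rule applies to an empty stack
        gamma, rest = stack[0], stack[1:]
        return sorted(
            (q, w + rest) for (head, (q, w)) in self.rules if head == (p, gamma)
        )


pds = PushdownSystem(
    locations=frozenset({"p", "q"}),
    stack_alphabet=frozenset({"a", "b"}),
    rules=frozenset({
        (("p", "a"), ("q", ("b", "a"))),  # push b on top of a
        (("q", "b"), ("p", ())),          # pop b
    }),
)
print(pds.successors(("p", ("a",))))      # [('q', ('b', 'a'))]
print(pds.successors(("q", ("b", "a"))))  # [('p', ('a',))]
```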

The behaviour of any Shylock program $P = \{p_0 :: B_0, \ldots, p_l :: B_l\}$ can be represented by a pushdown system $\mathcal{P}_P = (\Lambda, \Sigma, \hookrightarrow)$, where $\Lambda$ is the set of all heaps and $\Sigma = \Lambda \cup cl(P) \cup \{Z\}$, where $Z$ is an element which occurs neither in $\Lambda$ nor in $cl(P)$. Here $cl(P)$ is the set of all possibly reachable statements in $P$, defined as the union of all $cl(B_i)$ for $0 \leq i \leq l$, with $cl(B)$ given inductively by:

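$$
\begin{aligned}
cl(x.f := y) &= \{x.f := y\} & cl(x := y.f) &= \{x := y.f\}\\
cl(x := \mathit{new}) &= \{x := \mathit{new}\} & cl(p) &= \{p\}\\
cl([x = y]B) &= \{[x = y]B\} \cup cl(B) & cl([x \neq y]B) &= \{[x \neq y]B\} \cup cl(B)\\
cl(B_1 + B_2) &= cl(B_1) \cup cl(B_2) \cup \{B_1 + B_2\} & cl(B_1; B_2) &= cl(B_1) \cup cl(B_2)
\end{aligned}
$$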
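The inductive definition of $cl$ translates into a short recursive function. In this sketch statements are encoded as nested tuples; the encoding and the names `cl` and `cl_program` are hypothetical illustrations, not Shylock's actual syntax tree.

```python
from typing import Dict, Set, Tuple

# Hypothetical statement encoding:
#   ("store", x, f, y)   x.f := y
#   ("load", x, y, f)    x := y.f
#   ("new", x)           x := new
#   ("call", p)          p
#   ("eq", x, y, B)      [x = y]B
#   ("neq", x, y, B)     [x != y]B
#   ("choice", B1, B2)   B1 + B2
#   ("seq", B1, B2)      B1; B2
Stmt = Tuple


def cl(b: Stmt) -> Set[Stmt]:
    tag = b[0]
    if tag in ("store", "load", "new", "call"):
        return {b}
    if tag in ("eq", "neq"):
        return {b} | cl(b[3])
    if tag == "choice":
        return cl(b[1]) | cl(b[2]) | {b}
    if tag == "seq":
        return cl(b[1]) | cl(b[2])
    raise ValueError(f"unknown statement: {b!r}")


def cl_program(program: Dict[str, Stmt]) -> Set[Stmt]:
    """cl(P): the union of cl(B_i) over all procedure bodies p_i :: B_i."""
    result: Set[Stmt] = set()
    for body in program.values():
        result |= cl(body)
    return result


# A one-procedure program  p :: x := new; p
program = {"p": ("seq", ("new", "x"), ("call", "p"))}
print(sorted(cl_program(program)))  # [('call', 'p'), ('new', 'x')]
```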

The rules of the pushdown system are specified using the abstract semantics as follows:

$$(H, \gamma) \hookrightarrow (H', w) \quad\text{iff}\quad (H, \gamma \bullet \Gamma) \to_a (H', w\Gamma)$$

where $H$ ranges over heaps. Further, we add rules $(H, \gamma) \hookrightarrow (H, Z)$ for every $(H, \gamma)$ which has no outgoing transitions, so that terminating computations starting from $(H_0, p_0 \bullet Z)$ are completed with stuttering steps. Because there are infinitely many heaps, the pushdown system constructed above will in general be infinite. Consequently, existing model checking techniques cannot be applied. In order to allow model checking, we consider a subclass of programs. First, we need the following definition:

Definition 5. A heap $H$ is $k$-bounded if at most $k$ objects are reachable in $H$ from the variables in $G \cup L$. A computation $(H_0, \Gamma_0) \to (H_1, \Gamma_1) \to \cdots$ (where the transition steps are according to the abstract semantics) is $k$-bounded if $H_i$ is $k$-bounded for all $i$. A program $P$ with main procedure $p_0$ is $k$-bounded if every computation $(H, p_0) \to \cdots$ is $k$-bounded.
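The heap-level part of this definition is a plain reachability count. The sketch below assumes a heap is given by a dict from variables to objects and a dict from objects to their fields; the names are hypothetical. It only decides whether a single heap is $k$-bounded: boundedness of computations and programs quantifies over all, possibly infinite, runs and is not decided here.

```python
from typing import Dict, Iterable, Set

Vars = Dict[str, int]               # variable -> object (absent = null)
Fields = Dict[int, Dict[str, int]]  # object -> field name -> object


def reachable(vars_: Vars, fields: Fields, roots: Iterable[str]) -> Set[int]:
    """Objects reachable in H from the variables in `roots`."""
    seen: Set[int] = set()
    stack = [vars_[x] for x in roots if x in vars_]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(fields.get(n, {}).values())
    return seen


def is_k_bounded(vars_: Vars, fields: Fields, variables: Iterable[str], k: int) -> bool:
    """H is k-bounded if at most k objects are reachable from G ∪ L."""
    return len(reachable(vars_, fields, variables)) <= k


# x refers to object 1, first points into a two-element cycle, object 4 is garbage.
vars_ = {"x": 1, "first": 2}
fields = {2: {"next": 3}, 3: {"next": 2}, 4: {"next": 4}}
G_and_L = ["x", "first"]
print(sorted(reachable(vars_, fields, G_and_L)))  # [1, 2, 3]
print(is_k_bounded(vars_, fields, G_and_L, 3))    # True
print(is_k_bounded(vars_, fields, G_and_L, 2))    # False
```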



Example 2. As an example of a program which is bounded in this sense, recall the program with a single procedure defined as $p :: x := \mathit{new}; p$, where $x$ is a local variable. Indeed, only one object identity is needed to represent the object to which $x$ refers, so this program is 1-bounded. Nevertheless, since $x$ is local, an unbounded number of objects are stored on the stack during the execution of the program.

For another example, recall the program from Example 1 which opens and closes a file. This program is $k$-bounded iff the visible heap during execution of the procedure $q(x)$ (with $x$ a fresh object) is $k$-bounded. □
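The first example can be simulated in a few lines. This hedged sketch unfolds the first `depth` calls of $p :: x := \mathit{new}; p$ (no globals, no fields) and contrasts allocating a never-used identity at every `new` with a hypothetical reuse policy that picks the smallest identity not currently visible. The reuse policy is an assumption for illustration, not the paper's abstract semantics. In both runs the visible heap never exceeds one object while the stack keeps growing.

```python
from typing import List, Optional, Set, Tuple


def run_example(depth: int, reuse: bool) -> Tuple[int, int, int]:
    """Unfold the first `depth` calls of  p :: x := new; p.

    Returns (largest visible heap, number of stacked frames, distinct identities).
    With reuse=True, `x := new` takes the smallest identity not currently visible
    (a hypothetical reuse policy, for illustration only).
    """
    saved: List[int] = []    # the caller's local x, one entry per stacked frame
    x: Optional[int] = None  # the callee's local x starts out null
    fresh = 0                # next never-used identity (no-reuse allocation)
    used: Set[int] = set()
    max_visible = 0
    for _ in range(depth):
        visible = set() if x is None else {x}  # no globals and no fields here
        if reuse:
            n = 0
            while n in visible:
                n += 1
        else:
            n, fresh = fresh, fresh + 1
        x = n                                  # x := new
        used.add(n)
        visible_after = {x}                    # the visible heap is just {x}
        max_visible = max(max_visible, len(visible_after))
        saved.append(x)                        # call p: push the caller's frame
        x = None                               # the callee starts with a null x
    return max_visible, len(saved), len(used)


print(run_example(1000, reuse=False))  # (1, 1000, 1000)
print(run_example(1000, reuse=True))   # (1, 1000, 1)
```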

Now if we restrict to $k$-bounded programs, the abstract semantics, because of its reuse of object identities, allows us to represent the behaviour of a program as a pushdown system as above, but using only $k$-bounded heaps as control states. By Theorem 1 we then obtain precise abstractions of $k$-bounded programs as finite pushdown systems. More precisely, given a $k$-bounded program $P$, we define the pushdown system $k\text{-}\mathcal{P}_P = (\Lambda_k, \Sigma_k, \hookrightarrow_k)$, obtained as a restriction of $\mathcal{P}_P$, as follows. First, $\Lambda_k = \{H \mid H \text{ is } k\text{-bounded}\} \cup \{\top\}$ consists of all $k$-bounded heaps together with an additional control location $\top$. The stack alphabet is given by $\Sigma_k = \Lambda_k \cup cl(P) \cup \{Z\}$, and the relation $\hookrightarrow_k$ is the restriction of $\hookrightarrow$ to $\Lambda_k$, together with two out-of-bound rules.

Note that in fact for any program $P$, $k\text{-}\mathcal{P}_P$ is a finite pushdown system. However, it is a precise abstraction of $P$ only if $P$ is $k$-bounded.

Specifications in linear temporal logic In order to do a precise pointer analysis of Shylock programs, we introduce a linear time temporal logic for describing the evolution of the heap structure. We first introduce the language of the properties satisfied by a given heap, which will form the atomic propositions of the linear temporal logic. We do so by introducing expressions of the Kleene algebra with tests [ Vij over fields and variables. More precisely, let Rite be the smallest set defined by the following grammar:

$$r ::= \varepsilon \mid x \mid \neg x \mid f \mid r.r \mid r + r \mid r^*$$

where $x$ ranges over variable names (to be used as tests) and $f$ over field names (to be used as actions). The regular expressions in Rite are similar to the heap patterns used in matching logic [20] and separation logic [18]. We define a transition relation $n \xrightarrow{r}_H m$ between objects of a heap $H$ as the least relation such that

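$$
\begin{aligned}
& n \xrightarrow{\varepsilon}_H n\\
& n \xrightarrow{x}_H n && \text{if } H(x) = n\\
& n \xrightarrow{\neg x}_H n && \text{if } H(x) \neq n\\
& n \xrightarrow{f}_H m && \text{if } H(f)(n) = m\\
& n \xrightarrow{r_1 + r_2}_H m && \text{if } n \xrightarrow{r_1}_H m \text{ or } n \xrightarrow{r_2}_H m\\
& n \xrightarrow{r_1.r_2}_H m && \text{if there exists an object } n' \text{ such that } n \xrightarrow{r_1}_H n' \text{ and } n' \xrightarrow{r_2}_H m\\
& n \xrightarrow{r^*}_H m && \text{if either } n = m \text{ or there exists an object } n' \text{ such that } n \xrightarrow{r}_H n' \text{ and } n' \xrightarrow{r^*}_H m
\end{aligned}
$$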

Further, we introduce the following modal interpretation of regular expressions:

$H \models r$ if and only if for each reachable object $n$ of $H$ there exists $m$ such that $n \xrightarrow{r}_H m$.



(Note that this coincides with the truth definition of $H \models \langle r \rangle \mathit{true}$ in dynamic logic.) For instance, the regular expression $\mathit{first}.\mathit{next}^*.\mathit{last} + \neg\mathit{first}$ is satisfied by a heap $H$ if and only if the object referred to by the variable $\mathit{first}$ is linked via a chain of $\mathit{next}$ fields with the object referred to by the variable $\mathit{last}$, or $\mathit{first}$ is not allocated.
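As an executable reading of the least relation $n \xrightarrow{r}_H m$ and of $H \models r$, here is a Python sketch. It assumes a heap is a pair of dicts (variables to objects, objects to fields), regular expressions are nested tuples, and "reachable object" means reachable from some variable; all names are hypothetical. The last line checks the example expression on a heap where the $\mathit{next}$ chain links $\mathit{first}$ to $\mathit{last}$, on one where the chain stops early, and on an empty heap where $\mathit{first}$ is not allocated.

```python
from typing import Dict, Set, Tuple

Vars = Dict[str, int]               # variable -> object (absent = unallocated)
Fields = Dict[int, Dict[str, int]]  # object -> field -> object
Heap = Tuple[Vars, Fields]
Rexp = Tuple  # ("eps",) ("test", x) ("ntest", x) ("field", f)
              # ("seq", r1, r2) ("alt", r1, r2) ("star", r)


def step(h: Heap, r: Rexp, n: int) -> Set[int]:
    """All objects m with n --r-->_H m."""
    vars_, fields = h
    tag = r[0]
    if tag == "eps":
        return {n}
    if tag == "test":
        return {n} if vars_.get(r[1]) == n else set()
    if tag == "ntest":
        return {n} if vars_.get(r[1]) != n else set()
    if tag == "field":
        m = fields.get(n, {}).get(r[1])
        return set() if m is None else {m}
    if tag == "seq":
        return {m for mid in step(h, r[1], n) for m in step(h, r[2], mid)}
    if tag == "alt":
        return step(h, r[1], n) | step(h, r[2], n)
    if tag == "star":  # reflexive-transitive closure, computed as a fixpoint
        result, todo = {n}, [n]
        while todo:
            for m in step(h, r[1], todo.pop()):
                if m not in result:
                    result.add(m)
                    todo.append(m)
        return result
    raise ValueError(f"unknown expression: {r!r}")


def reachable_objects(h: Heap) -> Set[int]:
    vars_, fields = h
    seen: Set[int] = set()
    todo = list(vars_.values())
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(fields.get(n, {}).values())
    return seen


def models(h: Heap, r: Rexp) -> bool:
    """H |= r: every reachable object has at least one r-successor."""
    return all(step(h, r, n) for n in reachable_objects(h))


# first.next*.last + ¬first
r = ("alt",
     ("seq", ("test", "first"), ("seq", ("star", ("field", "next")), ("test", "last"))),
     ("ntest", "first"))
linked = ({"first": 1, "last": 3}, {1: {"next": 2}, 2: {"next": 3}})
broken = ({"first": 1, "last": 3}, {1: {"next": 2}})
print(models(linked, r), models(broken, r), models(({}, {}), r))  # True False True
```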

Our formulas are built according to the following grammar:

$$\phi ::= \mathit{true} \mid r \mid \neg\phi \mid \phi_1 \wedge \phi_2 \mid X\phi \mid \phi_1 \, U \, \phi_2$$

where $r$ ranges over Rite. The other propositional connectives $\vee$ and $\rightarrow$ are defined in terms of $\wedge$ and $\neg$. Further, we define $F\phi = \mathit{true} \, U \, \phi$ and $G\phi = \neg F(\neg\phi)$. We define $At(\phi)$ to be the set of atomic propositions $r \in$ Rite which appear in $\phi$. Clearly $At(\phi)$ is always a finite set.

For a set $X$ we denote its powerset by $2^X$. With $(2^{At(\phi)})^\omega = \{w_0 w_1 w_2 \cdots \mid w_i \in 2^{At(\phi)} \text{ for all } i \geq 0\}$ we denote the set of infinite words over sets of expressions in $At(\phi)$. Given such an infinite word $w = w_0 w_1 w_2 \ldots$ we denote by $w_i$ the $i$-th element, and by $w[i\ldots]$ the suffix $w_i w_{i+1} \ldots$ starting from the $i$-th element of $w$. For a formula $\phi$ and an infinite word $w \in (2^{At(\phi)})^\omega$ we write $w \models \phi$ to denote that $w$ satisfies $\phi$. This satisfaction relation is defined inductively on the structure of $\phi$ according to the standard semantics of LTL yj:



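$$
\begin{aligned}
w &\models \mathit{true} &&\\
w &\models r && \text{iff } r \in w_0\\
w &\models \phi_1 \wedge \phi_2 && \text{iff } w \models \phi_1 \text{ and } w \models \phi_2\\
w &\models \neg\phi && \text{iff } w \not\models \phi\\
w &\models X\phi && \text{iff } w[1\ldots] \models \phi\\
w &\models \phi_1 \, U \, \phi_2 && \text{iff } \exists j \geq 0.\ w[j\ldots] \models \phi_2 \text{ and } w[i\ldots] \models \phi_1 \text{ for all } 0 \leq i < j
\end{aligned}
$$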
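The semantics above quantifies over infinite words. For intuition only, the sketch below evaluates formulas on ultimately periodic words $u\,v^\omega$, a standard finite representation of some infinite words; it is not the Büchi-automaton construction used for model checking. Atomic propositions are plain strings standing for Rite expressions, $F$ and $G$ are the derived operators defined above, and the function names are hypothetical. Until is computed as the least fixpoint of $\phi_2 \vee (\phi_1 \wedge X(\phi_1 \, U \, \phi_2))$ over the finitely many positions of the lasso.

```python
from typing import List, Set, Tuple

Formula = Tuple  # ("true",) ("ap", r) ("not", f) ("and", f, g) ("X", f) ("U", f, g)


def holds(prefix: List[Set[str]], loop: List[Set[str]], phi: Formula) -> bool:
    """Decide w |= phi for the ultimately periodic word w = prefix . loop^omega."""
    if not loop:
        raise ValueError("the loop part must be non-empty")
    word = prefix + loop
    size = len(word)

    def succ(i: int) -> int:  # lasso position of the suffix w[i+1...]
        return i + 1 if i + 1 < size else len(prefix)

    def sat(f: Formula) -> Set[int]:  # positions i with w[i...] |= f
        tag = f[0]
        if tag == "true":
            return set(range(size))
        if tag == "ap":
            return {i for i in range(size) if f[1] in word[i]}
        if tag == "not":
            return set(range(size)) - sat(f[1])
        if tag == "and":
            return sat(f[1]) & sat(f[2])
        if tag == "X":
            s = sat(f[1])
            return {i for i in range(size) if succ(i) in s}
        if tag == "U":  # least fixpoint of  f2 or (f1 and X(f1 U f2))
            s1, result = sat(f[1]), set(sat(f[2]))
            changed = True
            while changed:
                changed = False
                for i in range(size):
                    if i not in result and i in s1 and succ(i) in result:
                        result.add(i)
                        changed = True
            return result
        raise ValueError(f"unknown formula: {f!r}")

    return 0 in sat(phi)


def F(f: Formula) -> Formula:
    return ("U", ("true",), f)


def G(f: Formula) -> Formula:
    return ("not", F(("not", f)))


r = ("ap", "r")
print(holds([set()], [{"r"}], F(r)), holds([set()], [{"r"}], G(r)))  # True False
print(holds([{"r"}], [set(), {"r"}], G(F(r))))                       # True
```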



Let $\pi$ be an infinite sequence of $k$-bounded heaps for some $k$, i.e., $\pi \in \{H_0 H_1 H_2 \cdots \mid H_i \in \Lambda_k \text{ for all } i \geq 0\}$. Intuitively, $\pi$ represents a trace of heaps which we encounter during a particular computation of a $k$-bounded program. We say that $\pi \models \phi$ if and only if there exists a sequence $w \in (2^{At(\phi)})^\omega$ such that $w \models \phi$ and for all $i \geq 0$:

$$\pi_i \models r \quad\text{for all } r \in w_i$$

Finally, the above relation $\models$ is extended pointwise to sets of infinite sequences of $k$-bounded heaps.



Model checking Recall that a Büchi automaton $\mathcal{B} = (V, A, \rightsquigarrow, Q_0, F)$ consists of a finite set of states $V$, an input alphabet $A$, a transition relation $\rightsquigarrow \subseteq V \times A \times V$, a set of initial states $Q_0$ and a set of final states $F$. The language $\mathcal{L}(\mathcal{B})$ accepted by $\mathcal{B}$ is the set of all infinite words $w$ over $A$ such that there is an infinite path via $\rightsquigarrow$ labeled by $w$, starting from a state in $Q_0$ and visiting an accepting state in $F$ infinitely often. Given a formula $\phi$, one can effectively construct a Büchi automaton $\mathcal{B}_\phi$ which recognizes exactly the set of words $w$ over sets of expressions in $At(\phi)$ satisfying $\phi$ (see e.g. lUl for details).
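The pushdown case needs the head-reachability machinery discussed below. For intuition only, the finite-state core of the question "is there an accepting run?" is the emptiness check for a plain Büchi automaton: its language is non-empty iff some final state reachable from an initial state lies on a cycle. The sketch below is a simple quadratic version of that check (nested depth-first search is the usual linear-time alternative); states and labels are strings, and all names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def reach(succ: Dict[str, Set[str]], sources: Iterable[str]) -> Set[str]:
    """States reachable from `sources`, including the sources themselves."""
    seen = set(sources)
    todo = list(seen)
    while todo:
        q = todo.pop()
        for nxt in succ.get(q, ()):
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def is_nonempty(transitions: Set[Tuple[str, str, str]],
                initial: Set[str], final: Set[str]) -> bool:
    """L(B) is non-empty iff some reachable final state lies on a cycle."""
    succ: Dict[str, Set[str]] = {}
    for q, _label, target in transitions:
        succ.setdefault(q, set()).add(target)
    for f in final & reach(succ, initial):
        # f lies on a cycle iff it is reachable from itself in at least one step
        if f in reach(succ, succ.get(f, set())):
            return True
    return False


# "Infinitely many a": state 1 is entered exactly after reading a.
inf_a = {("0", "a", "1"), ("0", "b", "0"), ("1", "a", "1"), ("1", "b", "0")}
print(is_nonempty(inf_a, {"0"}, {"1"}))              # True
print(is_nonempty({("0", "a", "1")}, {"0"}, {"1"}))  # False
```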

Let $\phi$ be a formula of our temporal logic for heaps, and let $P$ be a $k$-bounded Shylock program. To check whether $P$ satisfies $\phi$, we have to check whether all computations of the pushdown system $k\text{-}\mathcal{P}_P$ starting from the initial configuration $(H_0, p_0 \bullet Z)$ satisfy $\phi$. This amounts to synchronizing $k\text{-}\mathcal{P}_P$ with the Büchi automaton $\mathcal{B}_{\neg\phi}$ for the negated formula and checking whether the resulting Büchi pushdown system has an accepting run, i.e., a run starting from the initial states of the two systems which visits infinitely often configurations whose control locations, projected onto the states of the Büchi automaton, are final HI ED. Thus $P$ satisfies $\phi$ exactly when no such run exists.

The problem of finding an accepting run of a Büchi pushdown system can be reduced to that of finding a repeating head reachable from the initial configuration fT0ll2Tl. Computing the repeating heads typically proceeds in two phases. In the first phase, one constructs the head reachability graph $\mathcal{G}$ associated with the Büchi pushdown system; in the second phase, $\mathcal{G}$ is analyzed to identify those nodes of the graph which are repeating heads. To avoid redundant computations, it suffices to construct $\mathcal{G}$ restricted to those configurations reachable from an initial configuration of the Büchi pushdown system. This can be done by forward reachability analysis, using the so-called post* method. For more details, see, e.g., Chapter 3 in [21].

Shylock and [4] As discussed in the introduction, the related work closest to our approach is presented in [4]. While [4] considers only reachability, it could be extended to a full model checking procedure along the lines discussed in this paper. At first sight, [4] offers a very high-level solution to the problem, while we give an explicit one. The semantics of procedure call, and specifically of procedure return, as given in [4] is stated in terms of abstract graph isomorphisms, and it is not clear how it should be implemented. In contrast, in this paper we give a purely symbolic characterization of procedure call and return.

Now we present a more detailed view of the differences between the semantics given in the two works. The procedure call in [4] employs the cut point mechanism as well, but it also cleans the heap of the currently unreachable objects. Hence, their procedure call passes to the callee only the strictly visible heap of the caller, in fact an isomorphic instance of it. In contrast, Shylock's procedure call relies on cut points as well, but it does not necessarily clean the heap (though it could). Instead, Shylock reuses object identities on demand, during the object creation statements of the procedure, while [4] does not pay attention to the body of the procedure because of the initial cleaning. Though Shylock may seem lazy with respect to cleaning, its memory reuse mechanism in fact acts as a localized cleaning. This pays off during the execution of procedure returns. Namely, in [4] the procedure return has to proceed by renaming the entire visible object space, so that it can synchronize the current heap with the caller's heap. Shylock, in contrast, renames only the name clashes, i.e., the objects in the intersection of the caller's global heap and the callee's current purely local heap. These differences induce different reasoning during the verification phase. Namely, while Shylock can afford to use heap equality during the model checking phase, the reachability procedure in [4] has to be performed on normal forms of the heaps (i.e., on the representatives of the graph isomorphism equivalence classes). It is not yet clear whether Shylock maintains strictly one representative of each isomorphism class; we plan to study this particular aspect in the near future.

5 Conclusions 

In the presence of recursive procedures and local variables, an unbounded number of objects can be 
allocated either on the call stack using local variables, or, anonymously, on the heap using reference 
fields. In this paper we discussed Shylock, a language which supports these features, together with 
a formal abstract semantics which allows model checking in the context of bounded visible heaps. We 
introduced a temporal logic for specifying properties of the heap, and discussed a procedure for checking 
these properties against Shylock programs. 

Future work Shylock's improved semantics has been implemented in the K framework. We are currently implementing a general model checking technique for recursive programs defined in K, from which we would obtain a Shylock model checker along the lines described in this paper. Further, we are investigating the expressive power of programs with a bounded visible heap.